ABSTRACT

This large-scale field experiment examined the
potential of various training and supervision programs to affect the
performance of health survey interviewers and the quality of data
they collect. It was found that interviewers who received less than
one day of basic training generally displayed inadequate interviewing
skills. A program of tape recording as part of the supervision of
household interviewers was associated with more precise and less
biased data if interviewers were more than minimally trained.
Training and supervision were found not to be compensatory but,
rather, to interact so that if either was inadequate the data were
adversely affected. The results also point to the value of designing
questions to minimize the need for probing, a significant source of
interviewer effects, and the value of procedures to communicate the
importance of accuracy to respondents. Overall, attention to a
variety of aspects of interviewer management (their training and
supervision, the design of questions, the procedures they are to use,
and the size of their assignments) is a cost-effective way to improve
the quality of survey-based estimates. (Author)

Foreword

This report is a summary of research conducted by Floyd Fowler and Thomas
Mangione on the costs and benefits of increased interviewer training and
supervision. It fills a major void in current knowledge about the
magnitude and nature of nonsampling error in health surveys. Their study
is one of the first to provide more than anecdotal evidence about: (1) the
optimum length of interviewer training, (2) the best method of supervision,
and (3) the implications of the size of interviewer assignments. In
addition, it adds to previous knowledge about questionnaire design and
wording.

The major value of this report lies in the set of practical recommendations
that are made to improve the quality of health survey data. They
include: (1) increase interviewer training beyond one day, (2) write
questions to minimize the need for interviewer probing, (3) tape-record all
interviews or a sample of interviews for supervisory review, and (4) reduce
the size of interviewer assignments. All four strategies for reducing
interviewer effects are quite cost-effective relative to increasing sample
size, the most common approach to increasing the precision of survey
estimates.

Since this study shows that the relationship between the length of training 
and data quality is not linear, the finding that one day of training is not 
adequate should not be interpreted to mean that "more is better." A 
complex interaction between length of training and mode of supervision was 
found that suggests that too much training may even be counter-productive 
without intensive supervision. In addition, it must be recognized that the
optimum length of training is a very study-specific issue that will depend,
to a large degree, on the complexity of the instrument.

An especially valuable contribution is the finding that taping
interviews is a very cost-effective alternative to the usual method of
direct supervisory observation. This finding should have a significant 
impact on the conduct of future health surveys. Finally, for the same 
level of precision, smaller interviewer assignments and hence larger staffs 
would be less costly than fewer interviewers taking larger assignments. 

The results of this study should enable survey planners and researchers to 
make more informed decisions concerning the tradeoffs that affect data 
quality in health surveys. By presenting clear and easy-to-implement
methods of reducing survey costs while increasing data quality, Fowler and 
Mangione have made an important methodological contribution to the health 
services research community. 

BACKGROUND

Many areas of social policy rely heavily on data developed by survey 
research techniques for planning and evaluation purposes. Obviously, the 
quality of estimates derived from surveys is of considerable importance in the 
effective pursuit of social policy goals. 

Methodologists often cite three sources of error, or reasons why figures 
derived from a sample survey do not accurately describe the true values for 
the populations from which they were drawn: 

A. Sampling error, the normal chance variability that occurs because a 
sample may differ within a calculable range from the population from which it 
was drawn. 

B. Nonresponse error, error resulting from the fact that data are not 
collected from every population member chosen to be in a sample. 

C. Measurement or response error, error stemming from the fact that
answers to questions do not perfectly measure what the researcher was trying
to measure. Factors that affect response error include problems with the
questions and the way they are designed, limitations on a respondent's ability
and willingness to answer questions accurately, and problems with the way that
an interviewer handles the question and answer process.

Current thinking about the design and execution of surveys emphasizes
"total survey design." By this, methodologists mean that researchers should
take into account all potential sources of error when designing a survey or
evaluating survey data. Although such a view may seem only reasonable, for
practical reasons it has not been common practice in the past.

Sampling error has long been a concern of sampling statisticians. The
limits on the confidence one can have in estimates from a random sample of a
particular size and design can be readily calculated. Most reports of survey
estimates include some acknowledgement of the role of sampling error in the
precision of the figures and usually attach some numerical estimates of
sampling error.

The significance of nonresponse for survey estimates is also commonly
acknowledged. Although the effect of nonresponse on survey estimates seldom
can be calculated very well, most researchers attempt to achieve a respectable
response rate and they commonly report the rate of response.

In contrast, other aspects of the data collection that directly affect
the quality of the measurement process are frequently overlooked entirely, both
in the design of surveys and in the reporting of the data collection. For
example, the amount and kind of effort that went into the development of the
questions, and evidence for the validity of the answers, are only rarely
reported. Most germane to the current topic, the interviewers and the quality
of interviewing in a survey are typically almost totally ignored, although
their importance is well documented (e.g., Hyman, 1954).

Interviewers have three different roles to play in a sample survey.
First, they are the ones who implement the sample design. How they do that 
affects response rates and costs. 

The second role of the interviewer is to train and motivate respondents 
to do their part in the interview. 

The third job of the interviewer is to handle his or her side of the 
question and answer process. Specifically, the interviewer asks the 
questions, clarifies questions as needed, stimulates the respondent and 
directs his or her effort in the event that an initial answer does not meet 
question objectives, and records the answers. 

We know from previous research that interviewers can influence the 
quality of estimates in two different ways. First, if interviewers are not
consistently standardized, survey-based estimates are less precise than they
otherwise would be for a sample of a given size. In essence, lack of 
standardization increases the amount of random error around the survey 
estimates and decreases the extent to which true differences among respondents 
are detectable in their answers. 

Interviewers can also systematically bias data or make them less valid. 
Cannell (1977a and 1977b) has demonstrated that the pace at which an interview 
is conducted, the kind of respondent behaviors that interviewers reinforce 
during the interview, and the goals that interviewers communicate to 
respondents can all be related to the accuracy with which respondents report. 

Although methodological studies have left little doubt that interviewers 
have a role to play in the quality of survey estimates, in a typical survey, 
the effects of the interviewers cannot be dissociated from other sources of 
error. Except for the response rate, for which interviewers have some 
responsibility, the effects of poor interviewing typically are not observable 
in data. Moreover, researchers have not had good information about the costs
and potential benefits of various strategies to improve the quality of
interviewing. This lack of information made it difficult for researchers
to consider realistically the trade-offs between the quality of interviewing
and such design decisions as the size of the sample or the response rate.

The research reported here was an explicit attempt to provide the 
information that researchers need about the role of interviewers in affecting 
survey data and the benefits of various options available to researchers to 
improve the quality of survey data. 

OVERVIEW OF METHODS 

This study was designed to assess how much various realistic options
for training and supervising interviewers improve the quality of survey
data. The study also was designed to provide data about the properties of
questions that make them susceptible to interviewer effects.

Four training programs were examined. The shortest program lasted only 
about half a day. The longest training program tested was ten days long, which 
is considerably more intensive and extensive than any training program 
routinely used by survey research organizations. The other two programs,
lasting two and five days respectively, are similar in length to training
programs commonly in use.

With respect to supervision, three levels were tested. The minimal 
program provided interviewers with only feedback on costs and response rates. 
Level II added review of a sample of completed interview schedules; Level III 
involved tape recording all interviews and providing feedback on interviewing 
techniques, as well as costs, response rates, and the quality of completed
interviews.

In all, 57 newly hired interviewers were randomly assigned in a balanced
design, first to one of the four training programs, then to one of the three
supervision programs. They carried out a special purpose health interview; on
average they each took 26 interviews.
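
The assignment step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report does not give its randomization procedure, so the function name `assign_balanced`, the seed, and the choice to fill the 12 training-by-supervision cells jointly (rather than in two stages) are all hypothetical. With 57 interviewers and 12 cells, a balanced design can only keep cell sizes within one of each other.

```python
import random
from collections import Counter
from itertools import product

TRAINING = ["<1 day", "2 days", "5 days", "10 days"]
SUPERVISION = ["Level I", "Level II", "Level III"]

def assign_balanced(interviewer_ids, seed=None):
    """Randomly assign interviewers to training x supervision cells.

    Hypothetical helper: cell sizes differ by at most one interviewer.
    """
    rng = random.Random(seed)
    ids = list(interviewer_ids)
    rng.shuffle(ids)                      # random order of interviewers
    cells = list(product(TRAINING, SUPERVISION))
    rng.shuffle(cells)                    # leftover interviewers land in random cells
    # Deal the shuffled interviewers round-robin across the 12 cells.
    return {iid: cells[i % len(cells)] for i, iid in enumerate(ids)}

assignment = assign_balanced(range(1, 58), seed=0)   # 57 interviewers
cell_sizes = Counter(assignment.values())
```

Here `cell_sizes` shows how many interviewers fall in each combination of training program and supervision level.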

There were four features of the experiment that we considered essential 
to its value and success. First, this large-scale project was designed to
have enough power to detect real effects and reach defensible conclusions. 
Second, in order to permit study of the effects of interviewers on data and 
evaluate the quality of their work, it was necessary to dissociate 
interviewers from idiosyncrasies of their sample. To do this, each 
interviewer's sample of respondents was representative of the sample 
population as a whole. Third, it was important to generalize about question 
form and content and how they relate to interviewer effects. Therefore, the 
health survey questionnaire was carefully constructed to include an array of 
common questions, as well as a sample of various types of questions. 

Fourth, we wanted to know not only whether or not the experimental 
training and supervisory programs affected data, but also to understand the 
whys, and to gain a more general understanding of the role of the interviewer 
in the data collection process. To do this, two additional
data collection activities were built into the project: 1) all respondents
who were in the survey were also reinterviewed to gain information about the
respondent's reaction to the survey process and to the interviewer; 2)
information was collected directly from interviewers after they had finished
their assignments about their perceptions of the job and of the interview
process.

These data, in combination with information derived from coding the tape 
recorded interviews taken by Supervision Level III interviewers, provided a 
unique opportunity to study what interviewers actually do and how they affect 
data. 

THE EFFECTS OF TRAINING AND SUPERVISION 

The central hypothesis tested in this study was that more extensive 
programs of training and closer supervision of interviewers would improve the 
quality of data that they collected. To examine this hypothesis, two measures 
of data quality were created. 

First, one goal of good survey interviewing is that interviewers be 
standardized; that is, that they do not affect the answers that they obtain. 
The effect of lack of standardization is a reduction in the precision of 
survey estimates. A measure of the extent to which answers can be predicted
by knowing the interviewer, and hence of the extent to which one can infer
answers were affected by interviewers, is the intraclass correlation, as
proposed by Kish (1962).
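
As a rough illustration of the kind of measure described here, the sketch below computes a one-way analysis-of-variance estimate of the intraclass correlation for answers grouped by interviewer. It is an assumption that this estimator matches the one used in the study; the report names Kish (1962) but does not give a formula. The function name `intraclass_correlation` and the toy data are hypothetical.

```python
from statistics import mean

def intraclass_correlation(groups):
    """ANOVA-based estimate of rho for answers grouped by interviewer.

    groups: list of lists of numeric answers, one inner list per interviewer.
    Assumed estimator: (MSB - MSW) / (MSB + (m0 - 1) * MSW).
    """
    k = len(groups)                                   # number of interviewers
    n = sum(len(g) for g in groups)                   # total interviews
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    # Effective assignment size; equals the common size when all are equal.
    m0 = (n - sum(len(g) ** 2 for g in groups) / n) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (m0 - 1) * ms_within)

# Toy data: answers fully determined by the interviewer give rho = 1.
rho_max = intraclass_correlation([[1, 1], [0, 0]])
# Toy data: identical interviewer means give the minimum value for m = 2.
rho_min = intraclass_correlation([[0, 1], [0, 1]])
```

A value near zero means that knowing the interviewer tells one almost nothing about the answer, which is the goal of standardization.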

Second, the validity of data, in the absence of a credible criterion, is more
difficult to evaluate. However, there are some questions for which we are 
able to make a good guess about the direction in which reported data are most 
likely to differ from the true answers. One such kind of question involves 
reporting the number of events occurring over a period of time. For many such 
questions, underreporting has been documented. Another class of questions,
those that may have socially desirable answers, has been shown to be prone to
error; overall, people tend to err in the direction of making their answers
more attractive or socially acceptable than the true value. Based on these
premises regarding patterns of error or bias, survey questions in the study
were selected for which a direction of "better," or less biased, data could be
specified. Because each interviewer's sample of respondents was a random
subsample of the total sample, any differences between the average answers 
given by an interviewer's respondents and those given by other interviewers' 
respondents could be attributed to the effects of the interviewer rather than 
to sample differences. 

Tables 1 and 2 present the results of analyses of variance looking at the
relationship between the training and supervision programs to which
interviewers were assigned and the values of these two measures of the quality
of data they collected. A number of important observations arise from these
two tables.

1) Training and supervision do matter. For both measures of quality 
the effect of the combination of training and supervision received was 
statistically significant. 

2) Tape recording interviewers in order to provide direct supervision
of the way they handle the question-and-answer process in the interview
improves the precision of survey estimates and, if interviewers have had more
than minimally adequate training, also probably improves the validity of the
data they collect.

3) Those who received the most training and were tape recorded were, as
a group, the best interviewers overall. Their data were significantly less
biased than those of the other groups, and their level of standardization was
equal to, or better than, that of any other group.

4) There is a complex interaction between training and supervision.
Although we hypothesized that they might be complementary, with, for example,
more training compensating for minimal supervision, or vice versa, our
findings are quite different. Instead we found that if either was inadequate
the quality of the data was diminished. Specifically, well trained but poorly
supervised interviewers were among the worst performers on both quality
dimensions, while intensive supervision of the least trained interviewers did
improve their degree of standardization but also produced data that were more
biased.

The study was designed not only to find out whether training and 
supervision affected data, with an eye to setting some minimal standards in 
those areas, but also to understand better the ways in which training and

TABLE 1

AVERAGE RHO (X1000) BY LEVEL OF
TRAINING AND SUPERVISION*

                                 Supervision Level
Length of
Training Program     Level I    Level II    Level III    Average

< 1 day                 14         10           8           11
2 days                  12          6          11            9
5 days                   9          9           8            9
10 days                 15         20           7           12

Average                 12         10           8           10


Analysis                      Sum of
of Variance           df      Squares       F        P

Training               3       22.19       1.94     .11
Supervision            2       14.60       1.92     .15
Interaction            6       47.67       2.09     .05

Mean Square Model    7.68     df 11
Mean Square Error    3.81     df 648
F                    2.02
P                    <.05

Contrasts

Supervision Level III (taped) vs. I & II (not taped)     t = 1.6

* Analysis includes only items for which interviewers affected the answer
(p<.10 by F test). Rho was transformed to Log [Rho / (1 - Rho)] prior to
analysis to more closely meet assumptions of normality.

** Probabilities indeterminate because multiple contrasts were run, but t
values meet or exceed usual values for 1-tailed test of significance (P<.05).
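
The arithmetic behind the analysis-of-variance summary in Table 1 can be checked with a short sketch. The sums of squares, degrees of freedom, and error mean square are copied from the table; the variable names are hypothetical, and reading the table note as the natural-log transform log(rho / (1 - rho)) is an assumption, since the base of the logarithm is not stated.

```python
import math

def logit(rho):
    """Log(rho / (1 - rho)); natural log is assumed, the report does not say."""
    return math.log(rho / (1.0 - rho))

# Values as printed in Table 1.
ss = {"training": 22.19, "supervision": 14.60, "interaction": 47.67}
df = {"training": 3, "supervision": 2, "interaction": 6}
ms_error = 3.81

# Model mean square: total model sum of squares over total model df (11).
ms_model = sum(ss.values()) / sum(df.values())
f_model = ms_model / ms_error

# F ratio for each term: its own mean square over the error mean square.
f_terms = {name: (ss[name] / df[name]) / ms_error for name in ss}

# A table entry of 14 corresponds to rho = 0.014 before transformation.
transformed = logit(14 / 1000)
```

Rounded to two decimals, these reproduce the printed model mean square (7.68), the overall F (2.02), and the three term F ratios.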

TABLE 2

AVERAGE STANDARD SCORE (X100)* ON QUESTIONS
JUDGED MOST LIKELY SUBJECT TO SYSTEMATIC BIAS
BY LEVEL OF TRAINING AND SUPERVISION

                                 Supervision Level
Length of
Training Program     Level I    Level II    Level III    Average

< 1 day                 18         11         -26            1
2 days                   7         -5          43           15
5 days                  -8         -5           7           -2
10 days                  0         21          52           27

Average                  4          5          20           10


Analysis                      Sum of
of Variance           df      Squares       F        P

Training               3        .49        2.44     .06
Supervision            2        .18        1.36     .26
Interaction            6       1.06        2.64     .01

Mean Square Model    .157     df 11
Mean Square Error    .067     df 3623
F                    2.35
P                    <.01

Contrasts

Taped vs. not taped (excluding 1-day training)     t = 1.77
10-day training and taped vs. rest                 t = 1.92

* A positive score is a score judged to be less biased.

** Estimates were adjusted for the fact of multiple measures per interviewer.
Probabilities indeterminate, because multiple contrasts were run, but t values
exceed usual values required for 1-tailed test of significance (P<.05).

supervision matter for interviewers, and the ways in which interviewer
behavior affects data. 

One analysis related interviewer training to specific interviewing
skills. In particular, we looked at how well interviewers asked
questions as worded, probed, recorded answers, and maintained a neutral
interpersonal relationship. For the
most part, our conclusions were based on coding of tape recorded interviews.

When we looked at these basic interviewing skills in relationship to
training, it was quite evident that giving interviewers less than one day of
training resulted in inadequate interviewing skills compared to the other
interviewers (Table 3). These are highly significant effects. Beyond that,
however, the increments in basic interviewing skills demonstrated by
interviewers with increasingly long training programs tended not to be
significant, with the exception of probing skills.

One might have thought that an effect of intensive supervision of 
interviewing behaviors would be to improve interviewer skills over time. This 
proved not to be the case. In fact, the basic interviewing skills of 
intensively supervised interviewers did not improve in the second half of 
their work compared to their first half. However, we did find evidence that
skills of interviewers who were not tape recorded deteriorated in the second 
half of their assignments. 

Thus, these data provide a basis for understanding the results of Tables
1 and 2. Interviewers need at least a couple of days of training to gain
skills that produce standardized data collection. Intensive supervision
through review and feedback from taped interviews maintains those skills.
Skills deteriorated among interviewers who were not intensively supervised,
particularly among those who acquired higher skill levels to begin with.

However, intense supervision did not create skills that were not acquired 
from training. Intense supervision of poorly trained interviewers produced 
nervous interviewers who gathered biased data. It is most important to note, 
however, that those who received the most training and were intensively 
supervised gathered the best data on both dimensions of quality. 



Although the study was organized to look at the way training and
supervision affect interviewer behavior and data quality, there are at least 
three other aspects of interviewer management that we came to appreciate as 
being important in reducing the contribution that interviewers make to data 
error. First, procedures that interviewers are asked to use in carrying out 
the interview affect quality. Second, the way questions are designed affects 
the likelihood that interviewers will affect the answers. Third, the size of 
the interviewers' assignments affects the potential impact of interviewers on 
the quality of data. 

Interviewer Procedures 

In the field of survey research, there is a commonly accepted set of
guidelines for interviewers regarding asking questions as worded, probing in a

TABLE 3

SELECTED MEASURES OF INTERVIEWER BEHAVIOR
FROM CODING TAPED INTERVIEWS BY TRAINING PROGRAM
(SUPERVISION LEVEL III ONLY)

                                        Length of Training Program
Interviewer Behaviors
from Tape Coding                     < 1 day   2 days   5 days   10 days     P

Average No. of Questions Read
Incorrectly/Interview                   21        7       14        6      <.01

Average No. of Directive
Probes/Interview                         8        5        5        3      <.01

Average No. of Times Failed to
Probe Inadequate Answers/Interview       8        6        5        5      <.01

Average No. of Inaccurate
Recording of Closed Question
Answers/Interview                        1        1        1        *       .05

Average No. of Inaccurate
Recording of Open Question
Answers/Interview                        4        2        2        2      <.01

Average No. of Instances of
Inappropriate Feedback/Interview         2        *        *        *      <.01

Percentage of Interviews
Rated Excellent or Satisfactory

Reading Questions as Worded             30       83       72       84      <.01

Probing Closed Questions                48       67       72       80      <.01

Probing Open Questions                  16       44       52       69      <.01

Recording Answers to
Closed Questions                        88       88       89       93       .74

Recording Answers to
Open Questions                          55       80       67       83      <.01

Non-biasing Interpersonal
Behavior                                66       95       85       90      <.01

* Less than 0.5 times per interview.

nonbiasing fashion, not influencing the answers through the recording process,
and being interpersonally neutral. Our analyses support the notion that good
question reading and non-directive probing are basic skills for reducing
interviewer effects.

With respect to bias, if we were to advocate one characteristic of an
interviewer for obtaining good data, it would be that he or she conveys to the
respondent that the accuracy of the data is important. Interviewers whom
respondents rated as being most concerned about accuracy produced
significantly less biased data. Interviewers can communicate the importance
of accuracy through the pace at which the interview proceeds and through
attention to their own role, asking questions exactly as worded and probing
carefully.

In addition, there were some indications in our data that being friendly
and relating to respondent needs may play some role in the accuracy of data
collected. There was a correlation between respondent-rated friendliness and
our measure of bias among the tape recorded interviewers (r = .31).

In terms of how to achieve these goals, in addition to simply telling
interviewers what to do, our findings tended to reinforce the salience and
relevance of the techniques Cannell (1977b) tested: giving interviewers
specific instructions in how to train respondents and reinforcing accurate
reporting as a goal.

Question Design 

As others have found, it was apparent that interviewers had more trouble 
with some questions than others. Our results were consistent with results
reported by Groves and Kahn (1979) showing a fourth to a third of survey items 
were subject to significant interviewer effects. 

The specific questions that were most subject to interviewer effects were 
the ones that interviewers had to probe frequently. When interviewers are 
required to probe in order to obtain an adequate answer, it produces 
opportunities for them to be inconsistent, to use directive probes, or to fail 
to probe inadequate answers. This finding is of considerable practical 
significance because better and more systematic pretesting of survey questions 
can identify problem questions. By rewriting these items we can probably 
reduce the need for probing and thereby increase the precision of estimates.

The most pervasive hypothesis in the literature about item types is 
probably that sensitive questions may be most subject to interviewer effects. 
We found just the opposite with respect to precision of estimates (not bias
itself), which, intriguingly, is similar to findings reported by Bradburn and
Sudman (1979). We found that interviewers were more consistent in the way
they handled sensitive questions than they were for the average question. 
This led to lower interviewer effects for sensitive questions. 

Size Of Interviewing Staff 

Another dimension of a study design to which a researcher could attend in
order to reduce error is the number of interviews taken by an interviewer. 
The total effect of interviewers on the precision of estimates in a study is a 
product of the intraclass correlation for interviewers and the average number 
of interviews taken per interviewer. For a given level of intraclass
correlation, smaller numbers of interviews per interviewer (and hence a larger
interviewer staff) will produce more precise data.
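
The relationship between intraclass correlation, assignment size, and precision can be sketched with the design-effect formula commonly attributed to Kish, 1 + rho(m - 1), where m is the average number of interviews per interviewer. The report describes the relationship only in words, so using this exact formula is an assumption, and the function names and example values below are hypothetical.

```python
import math

def interviewer_design_effect(rho, interviews_per_interviewer):
    """Variance multiplier due to interviewers: 1 + rho * (m - 1) (assumed form)."""
    return 1.0 + rho * (interviews_per_interviewer - 1)

def standard_error_inflation(rho, interviews_per_interviewer):
    """Factor by which interviewer effects inflate a standard error."""
    return math.sqrt(interviewer_design_effect(rho, interviews_per_interviewer))

# Illustrative values only: rho = .02 with 26 versus 13 interviews each.
deff_26 = interviewer_design_effect(0.02, 26)
deff_13 = interviewer_design_effect(0.02, 13)
inflation_26 = standard_error_inflation(0.02, 26)
```

Under these assumptions, halving the assignment size from 26 to 13 interviews lowers the variance multiplier from about 1.5 to about 1.24, which illustrates why a larger staff with smaller assignments yields more precise estimates at a fixed rho.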

In addition, we have suggestive evidence that the quality of data 
deteriorates as more interviews are taken by an interviewer. This is similar 
to results reported by Cannell (1977a). 

Thus, from the point of view of reliability, as reflected in intraclass
correlations, and from the point of view of validity, using more
interviewers, and thus having each interviewer take fewer interviews on
average, seems to be a constructive way to improve the quality of survey
data. This recommendation, of course, assumes a similar level of supervision
and training.



Good methodology has been defined as designing a study to get the most 
precision or accuracy for a dollar. To do this, one should attend to the 
various features of any data collection enterprise that affect the data that 
result. 

For too long, only the calculation of sampling errors and response rates 
has passed for methodological rigor. Features of survey design that may be 
equally or more important in the overall quality of survey estimates, notably 
survey question design and interviewer performance, often are ignored. 

Of course, it is easy to understand why these considerations can be 
ignored; the effects of poor question design and of poor interviewing are not 
immediately apparent when data are analyzed. It takes special calculations to
find out how well the measurement process has actually been carried out. Yet, 
the fact that error is not obvious does not mean the error is not there. 

There are a good number of questions (probably more than half in most 
health surveys) that are relatively unaffected by interviewers. However, 
about a fourth to a third of the questions in representative health surveys 
are significantly influenced by the quality of interviewing. These are not
obscure or unimportant questions. Some commonly used questions which were
subject to significant interviewer effects include:

                                                                        Rho

Are you limited in any way because of a disability
or health condition?                                                    .014

In the past 12 months, did you have hemorrhoids or piles?               .017

In the past 12 months, did you have deafness in one or both ears?       .020

How many days in the last month would you say you had
(USUAL NUMBER) of drinks?                                               .034

How long ago was the last time you were actually seen by a doctor
about your health: within the last month, 1 to 6 months ago,
6 months to a year ago, or more than a year ago?                        .037


When this project began, we knew that interviewers had a largely
unappreciated role in the creation of survey error. Moreover, we knew that 
there were very few practical guidelines for researchers and for those who 
would purchase research for how to minimize the effects of interviewers on 
survey data. In the preceding pages, we have outlined five very practical and 
useful ways, in addition to increasing sample size and minimizing nonresponse, 
by which researchers can improve their estimates by improving the way that 
interviewers do their jobs. Although the value of stressing accuracy to 
respondents emerges from our data, the techniques for doing this were 
developed in Cannell's work (e.g., 1977b). However, this project clearly
supports the value of four additional strategies aimed at improving survey
estimates by reducing interviewer effects on data. 

Table 4 summarizes these four strategies, along with the most common 
approach to increasing the precision of survey estimates, increasing the size 
of the samples. 

1. Give more than minimal training in basic interviewing skills. 

2. Make tape recording, review, and feedback a standard part of 
supervision. 

3. Try to design questions to make asking them easy and reduce the need
for probing, and pretest them thoroughly to make sure the attempt is
successful.

4. Reduce the number of interviews taken per interviewer. 

Of course, the cost effectiveness of the steps outlined in Table 4 and 
how much estimates will be improved will vary from setting to setting and 
estimate to estimate. However, for that important subset of items that 
interviewers significantly influence, steps to reduce interviewer effects are 
quite cost effective ways to improve the quality of estimates. We believe 
these data provide practical guidelines for researchers for improving their 
survey data by attending to the quality of interviewing. Standards for the 
way interviewers are managed have been too long absent, despite a history of 
research showing that interviewers matter. We are hopeful that a concrete 
effect of this project will be to raise attention to the interviewer, as 
part of the total design of surveys, to the status it deserves. 

TABLE 4 

FIVE WAYS TO DECREASE STANDARD 
ERRORS OF ESTIMATES 

Sample Size [1] 
    Approach: Increase effective sample size by about 20%. 
    Likely Cost: About a pro rata increase in data collection and data 
        reduction costs. 
    Effect on Standard Errors: Decrease by 10%. 

Interviewer Training [2] 
    Approach: If interviewers receive less than 1 day of basic training, 
        increase by a day or two. 
    Likely Cost: Equivalent to about 12 hours of interviewer wages per 
        extra training day per interviewer. 
    Effect on Standard Errors: Decrease by 10% for the 1/3 of survey items 
        which are most affected by interviewers. 

Tape Supervision [3] 
    Approach: Tape all or a sample of interviews, review one a week per 
        interviewer, provide feedback. 
    Likely Cost: About 2 hours/interviewer per week. 
    Effect on Standard Errors: Decrease by more than 10% for 1/3 of items 
        most affected by interviewers. 

Question Design 
    Approach: Rewrite questions to reduce need for probing and make 
        administration and reading of questions easier. 
    Likely Cost: About twice the length of the interview to tabulate 
        interviewer behavior from taped pretest interviews, plus time to 
        rewrite questions. 
    Effect on Standard Errors: Efficacy not yet demonstrated but data 
        suggest noticeable gains likely. 

Number of Interviews Per Interviewer [4] 
    Approach: Reduce assignment size by 20% by using 20% more 
        interviewers. 
    Likely Cost: Difficult to estimate but certainly less than changing 
        sample size to produce same effect. 
    Effect on Standard Errors: Decrease by 10% for 1/3 of items most 
        affected by interviewers. 

NOTES: 

1. If complex design rather than simple random sample, may entail more 
   than 20 percent more interviews. 

2. Clearly produces direct effects on standard errors only beyond minimal 
   training. However, even more training may also pay off in decreasing 
   bias in data. 

3. Probably even greater benefits over time as interviewers deteriorate 
   without taping and feedback. Also, significantly reduces bias for 
   adequately trained interviewers. 

4. Probably also reduces bias through reduced burn out. 
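
The sample size row of Table 4 can be checked with simple arithmetic. Under simple random sampling the standard error of a mean is proportional to one over the square root of the sample size, so the effect of a given increase can be sketched as below. The function name is illustrative and not part of the report; the calculation ignores any complex design features.

```python
import math

def se_ratio_after_increase(fractional_increase: float) -> float:
    """Ratio of new to old standard error when the effective sample size
    grows by the given fraction (0.20 means 20 percent more cases).

    Assumes simple random sampling, where SE is proportional to 1/sqrt(n).
    """
    if fractional_increase <= -1.0:
        raise ValueError("the sample size must remain positive")
    return 1.0 / math.sqrt(1.0 + fractional_increase)

ratio = se_ratio_after_increase(0.20)
print(f"SE multiplier after a 20% increase: {ratio:.3f}")
print(f"Percent decrease in SE: {100 * (1 - ratio):.1f}")
```

A 20 percent increase gives a multiplier of roughly 0.91, that is, a decrease of a little under 10 percent, which is consistent with the rounded figure shown in the table.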

APPENDIX 

Calculation of Rho (Interviewer Effects) 

One common barrier to attending to interviewer effects is the lack of a 
measure. When interviewers are assigned to samples purely on the basis of 
proximity (or some similar nonrandom criterion) rho cannot be meaningfully 
calculated. However, if interviewers have assignments that are random subsets 
of the whole sample, then interviewer effects can be calculated for each item 
in a survey. Also, for assignments that are roughly random (e.g. in 
centralized telephone facility studies), these calculations provide imperfect 
but useful estimates of interviewer effects. 
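
As a minimal sketch of what "random subsets of the whole sample" can mean in practice, the Python fragment below shuffles the sample and deals cases out to interviewers in turn. The function name, the interviewer labels, and the round-robin dealing are illustrative assumptions; the report does not prescribe a particular assignment procedure, and field studies may need to randomize within geographic areas instead.

```python
import random

def assign_randomly(case_ids, interviewer_ids, seed=None):
    """Deal a shuffled sample out to interviewers in round-robin order.

    Each interviewer ends up with a random subset of the whole sample,
    and assignment sizes differ by at most one case.
    """
    interviewers = list(interviewer_ids)
    if not interviewers:
        raise ValueError("at least one interviewer is required")
    rng = random.Random(seed)      # a seed is used only for reproducibility
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    assignments = {interviewer: [] for interviewer in interviewers}
    for position, case in enumerate(shuffled):
        target = interviewers[position % len(interviewers)]
        assignments[target].append(case)
    return assignments

# Hypothetical example: 100 sample cases shared among 4 interviewers.
workloads = assign_randomly(range(1, 101),
                            ["int_A", "int_B", "int_C", "int_D"], seed=7)
for interviewer, cases in workloads.items():
    print(interviewer, len(cases))
```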

A second barrier to these calculations is the limited availability of appropriate 
software. There are specialized programs to calculate interviewer effects, 
usually variations on programs designed to calculate sample design effects, 
but they are not available in many computer facilities. However, a standard 
analysis of variance program for Generalized Linear Models can be used to 
accomplish the same thing with reasonable precision. 

For each interval or ratio scale variable, run an analysis of variance 
with interviewer as the random effects variable. For nominal scale data, 
translate answers to dummy (i.e. 1/0) variables and use these as the dependent 
variables in the ANOVA. From the analysis of variance output, note the 
Model Mean Square and the Error Mean Square terms, and compute the average 
number of interviews per interviewer (n). 
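
Where no ANOVA program is at hand, the two mean squares can be computed directly. The following Python sketch is one way to do so for a single item, with interviewer as the grouping variable; the function names and the small data set are invented for illustration and do not come from the study.

```python
from collections import defaultdict

def dummy(answer, category):
    """Recode a nominal answer as 1 if it equals `category`, else 0."""
    return 1 if answer == category else 0

def anova_mean_squares(records):
    """One-way analysis of variance with interviewer as the grouping factor.

    `records` is an iterable of (interviewer_id, value) pairs.
    Returns (model_ms, error_ms, average interviews per interviewer).
    """
    groups = defaultdict(list)
    for interviewer, value in records:
        groups[interviewer].append(float(value))
    k = len(groups)
    total_n = sum(len(values) for values in groups.values())
    if k < 2 or total_n <= k:
        raise ValueError("need at least 2 interviewers and some replication")
    grand_mean = sum(sum(values) for values in groups.values()) / total_n
    ss_model = 0.0   # between-interviewer sum of squares
    ss_error = 0.0   # within-interviewer sum of squares
    for values in groups.values():
        mean = sum(values) / len(values)
        ss_model += len(values) * (mean - grand_mean) ** 2
        ss_error += sum((x - mean) ** 2 for x in values)
    return ss_model / (k - 1), ss_error / (total_n - k), total_n / k

# Artificial data: three interviewers with three interviews each.
records = [("A", 1), ("A", 2), ("A", 3),
           ("B", 2), ("B", 3), ("B", 4),
           ("C", 6), ("C", 7), ("C", 8)]
model_ms, error_ms, avg_n = anova_mean_squares(records)
print(model_ms, error_ms, avg_n)   # 21.0 1.0 3.0
```

For a nominal item, the same function can be applied after recoding, for example to `(interviewer, dummy(answer, "yes"))` pairs.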



$$\rho = \frac{(\text{Model MS} - \text{Error MS})/n}{\text{Error MS} + (\text{Model MS} - \text{Error MS})/n}$$
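
The calculation of rho from the two mean squares can be sketched in a few lines of Python. The function name and the example figures are hypothetical. The sketch returns the raw estimate, which can be negative when the Model Mean Square is smaller than the Error Mean Square; how to treat such values is left to the analyst.

```python
def interviewer_rho(model_ms, error_ms, n):
    """Intraclass correlation (rho) from one-way ANOVA mean squares.

    rho = ((Model MS - Error MS) / n) / (Error MS + (Model MS - Error MS) / n)
    where n is the average number of interviews per interviewer.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    between = (model_ms - error_ms) / n   # between-interviewer variance
    denominator = error_ms + between      # total variance estimate
    if denominator <= 0:
        raise ValueError("mean squares give a non-positive total variance")
    return between / denominator

# Hypothetical mean squares for one item, 25 interviews per interviewer.
rho = interviewer_rho(model_ms=1.5, error_ms=1.0, n=25)
print(round(rho, 4))   # 0.0196
```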



Calculation of Design Effects on 
Size of the Standard Error 

$$\text{Design Effect} = \sqrt{1 + \rho\,(n - 1)}$$

n = average number of interviews per interviewer 



If the number of interviews is reasonably similar, a simple average may be 
used. If they range widely, a more complex calculation is needed (see Groves 
and Magilavy, 1980). 
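
The design effect on the standard error, the square root of 1 + rho (n - 1), is simple to compute for any combination of rho and assignment size. The Python sketch below uses an illustrative function name and assumes that a simple average assignment size is adequate, as discussed above.

```python
import math

def deft(rho, n):
    """Multiplier of the standard error due to interviewer effects:
    sqrt(1 + rho * (n - 1)), with n the average number of interviews
    per interviewer."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.sqrt(1.0 + rho * (n - 1))

# Multipliers for a few assignment sizes and values of rho.
for n in (11, 51, 101):
    print(n, [round(deft(rho, n), 2) for rho in (0.01, 0.02, 0.03)])
```

Note how the multiplier grows with assignment size even when rho is small: with rho of .03 and 101 interviews per interviewer, the standard error is doubled.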

Table A1 below provides sample calculations of design effects (defts) for 
various values of rho and interviewer assignment sizes. 

TABLE A1 

MULTIPLIERS OF ESTIMATES OF STANDARD ERRORS OF MEANS 
DUE TO INTERVIEWER EFFECTS* FOR SELECTED VALUES OF 
RHO AND AVERAGE INTERVIEWER ASSIGNMENTS 

Average 
Interviewer               Intraclass Correlation (Rho) 
Assignment Size      .005     .01      .015     .02      .03 

       11            1.02     1.05     1.07     1.10     1.14 
       21            1.05     1.10     1.14     1.18     1.26 
       31            1.07     1.14     1.20     1.26     1.38 
       51            1.12     1.22     1.32     1.41     1.58 
       81            1.18     1.34     1.48     1.61     1.84 
      101            1.22     1.41     1.58     1.73     2.00 

*Estimates of standard errors calculated from the sample size and design 
should be inflated by the multiplier in the table to take into account the 
effect of interviewers. 
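
As an illustration of how a multiplier from the table is applied, the Python sketch below inflates the simple random sampling standard error of a proportion. The survey figures (a proportion of .30 from 1,000 interviews) and the function names are hypothetical; the multiplier of 1.41 is the table entry for rho of .02 and an average assignment of 51 interviews.

```python
import math

def se_of_proportion(p, sample_size):
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / sample_size)

def inflate(standard_error, multiplier):
    """Apply an interviewer-effect multiplier to a standard error."""
    return standard_error * multiplier

srs_se = se_of_proportion(0.30, 1000)   # hypothetical estimate
adjusted_se = inflate(srs_se, 1.41)     # rho = .02, assignment size = 51
print(f"{srs_se:.4f} -> {adjusted_se:.4f}")   # 0.0145 -> 0.0204
```

In this example, ignoring interviewer effects would understate the standard error by roughly 30 percent.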